ABSTRACT

The  concurrent  manipulation  of  a  binary  search  tree  is  considered  in  this  paper.  The 
systems  presented  can  support  any  number  of  concurrent  processes  which  perform 
searching,  insertion,  deletion,  and  rotation  (reorganization)  on  the  tree,  but  allow  any  process 
to  lock  only  a  constant  number  of  nodes  at  any  time.  Also,  in  the  systems,  searches  are 
essentially  never  blocked.  The  concurrency  control  techniques  introduced  in  the  paper 
include  the  use  of  special  nodes  and  pointers  to  redirect  searches,  and  the  use  of  copies  of 
sections  of  the  tree  to  introduce  many  changes  simultaneously  and  therefore  avoid 
unpredictable  interleaving.  Methods  developed  in  this  paper  may  provide  new  insights  to 
other  problems  in  the  area  of  concurrent  database  manipulation. 
1.  Introduction

As  the  construction  of  large  multiprocessors  (such  as  Cm*  [22])  becomes  practicable,  much 
thought  has  been  given  to  methods  of  exploiting  these  powerful  computers.  One  natural  and 
important  application  is  the  use  of  the  multiprocessing  power  to  manipulate  large  data  bases. 
Multiprocessors  might  be  used  to  simultaneously  service  the  needs  of  several  database  users, 
or  to  reduce  the  time  necessary  for  a  single  complex  task.  In  order  to  gain  experience  in  this 
direction,  we  studied  the  use  of  multiprocessors  in  manipulating  a  simple  data  structure 
known  as  a  binary  search  tree.  As  a  result,  we  designed  systems  that  could  support  any 
number  of  the  concurrent  operations  of  insertion,  deletion,  and  reorganization  (specifically, 
rebalancing)  on  the  tree.  Although  the  systems  are  designed  for  implementation  on 
multiprocessors,  they  are  also  useful  for  implementation  on  uniprocessors  that  support 
multiprogramming.  This  paper  presents  these  systems  and  discusses  the  ideas  behind  them. 

Some  General  Techniques  Used 

One  problem  often  encountered  by  concurrent  systems  is  the  necessity  of  doing  a  set  of 
operations  simultaneously  or  indivisibly  for  correctness  reasons.  This  occurs  where  any 
partial  completion  of  the  set  may  lead  to  a  temporary  inconsistency  in  the  data  structure.  To 
solve this dilemma, we introduce the idea of "copies" of sections of the binary search tree.
These copies are to be created outside the tree, modified as appropriate to reflect the result
of the set of operations on the tree, and then introduced into the tree structure with a single,
indivisible  operation.  This  technique  may  be  used  to  simultaneously  replace  all  of  the  pieces 
of  an  old  version  of  that  section  of  the  tree,  effectively  performing  many  modifications 
simultaneously. 
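
The copying idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not one of the systems developed in this paper: the `Node` class and the helpers `copy_subtree` and `inorder` are hypothetical, and a single attribute assignment stands in for the "single, indivisible operation."

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    value: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def copy_subtree(n):
    """Return a private copy of the subtree rooted at n (None stays None)."""
    if n is None:
        return None
    return Node(n.value, copy_subtree(n.left), copy_subtree(n.right))

def inorder(n):
    """List the values of the subtree rooted at n in inorder."""
    return [] if n is None else inorder(n.left) + [n.value] + inorder(n.right)

# A small tree: parent points to the section that is about to be replaced.
parent = Node(100, left=Node(10, Node(5), Node(15)))
old_section = parent.left

# 1. Build the copy outside the tree and apply several changes to it.
new_section = copy_subtree(old_section)
new_section.left.right = Node(7)      # first modification
new_section.right.left = Node(12)     # second modification

# A process that follows parent.left still sees the old, unmodified section.
before = inorder(parent.left)

# 2. Install the copy with one pointer assignment; both changes appear at once.
parent.left = new_section
after = inorder(parent.left)
```

No process can observe a state in which only one of the two modifications has been made, since the only change to the shared structure is the final assignment.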

In  using  the  copying  technique,  a  substantial  amount  of  work  is  done  before  the  results  of 
that  work  are  introduced  into  the  data  structure.  Conversely,  we  also  use  the  technique  of 
"postponement:"  delaying  any  work  that  need  not  be  done  immediately.  Each  process  only 
does  "what  it  has  to  do."  Other  processes  can  perform  the  postponed  work  separately.  With 
this  technique,  the  unique  multiprocessing  capability  supplied  by  multiprocessors  is  utilized. 
This  is  particularly  advantageous  in  the  case  where  work  cannot  be  done  immediately,  and  a 
process  would  have  had  to  wait;  instead,  it  can  relegate  the  work  to  another  process,  to  be 
done  when  feasible. 

Another  difficulty  generally  encountered  in  asynchronous  concurrent  processing  is  that  the 
actions  of  one  process  may  serve  to  invalidate  some  decisions  made  by  another  process.  It 
may  be  the  case  that  a  process  will  see  the  tree  "change  out  from  under  it."  For  this 
possibility,  we  provide  a  recovery  mechanism  for  "confused"  processes,  in  the  form  of  "back 
pointers"  that  redirect  processes  whose  position  in  the  tree  has  been  invalidated  by  the 
actions of other processes. These back pointers are attached to "blue nodes" which signal
the  process  that  it  is  lost  in  the  first  place. 

In  designing  algorithms  to  use  these  techniques,  we  tried  to  keep  the  general  design  of  the 
algorithms  simple  and  efficient.  For  example,  our  locking  scheme  uses  no  reader  locks;  nor  do 
we  permit  any  process  to  exclusively  lock  a  node.  We  only  use  writer-exclusion  locks  that 
prevent  simultaneous  update  of  a  node  by  more  than  one  process.  The  locking  scheme  itself 
is  also  quite  simple.  No  complex  queueing  mechanism  is  required  to  administrate  lock 
requests, on whose order the well-being of the system might depend. In addition, the
number  of  nodes  which  any  process  can  lock  at  one  given  time  is  bounded  by  a  very  small 
constant,  placing  a  tight  bound  on  the  degree  to  which  any  single  process  can  interfere  with 
others. 

Utilizing  the  ideas  mentioned  above,  we  build  a  set  of  tree-mutating  processes.  In  addition, 
we  study  garbage  collection  mechanisms  that  make  available  for  reuse  nodes  that  have  been 
deleted  from  the  tree.  While  garbage  collection  processes  are  not  specifically  tree  mutators, 
they  are  necessary  for  the  completeness  and  correct  functioning  of  the  systems.  We 
illustrate  two  such  mechanisms:  a  simple  version  with  a  single  garbage  collection  process, 
and  a  version  that  allows  concurrent  garbage  collection  (many  collectors)  to  operate  at  the 
same  time  as  tree  mutators.  Here  we  note  another  illustration  of  the  idea  of  postponement: 
it is generally unnecessary to collect garbage immediately after it is generated.

Developing  these  algorithms  has  strengthened  our  belief  that  concurrent  algorithms  are  for 
the  most  part  far  less  intuitive  than  sequential  algorithms.  This  is  one  reason  that  much 
attention  has  been  given  recently  to  the  proof  of  the  correctness  of  concurrent  programs 
(following  in  the  footsteps  of  the  work  on  verification  of  sequential  programs).  We  offer 
verifications  of  our  systems,  and  include  a  sketch  of  the  correctness  proof  for  the  concurrent 
garbage  collector. 

Substantial  work  has  been  done  on  developing  concurrent  algorithms  for  the  manipulation 
of  B-trees,  which  are  a  popular  data  storage  structure,  especially  for  large  collections  of  data 
(see  appendix  II).  These  algorithms  have  steadily  improved,  using  as  a  measure  the  size  of 
the  B-tree  region  locked  by  a  process.  An  adaptation  of  the  results  in  the  present  paper 
allows  yet  another  improvement  to  B-tree  algorithms  along  these  lines  (see  [16]).  Further 
generalizations  of  the  ideas  presented  here  may  suggest  highly  concurrent  algorithms  for 
manipulating  other  data  structures. 

Outline  of  the  Paper 

In  Section  2  we  define  the  concurrent  manipulation  problem  studied  in  the  paper,  state  our 
assumptions and set up the definitions to be used in our correctness proofs. In the next
three  sections  (3,  4,  and  5),  we  develop  our  concurrent  systems.  In  Section  6  we  propose  a 
simple  garbage  collection  mechanism.  A  summary  and  concluding  remarks  are  given  in  Section 
7.  In  appendix  I  we  elaborate  upon  a  concurrent  garbage  collection  mechanism.  In  appendix 
II  we  give  some  background  for  this  problem  area,  and  describe  related  work  that  has  been 
done.  In  appendix  III  we  offer  a  natural  correctness  criterion  for  concurrent  search  systems, 
and  argue  that  the  properties  we  have  proven  for  our  systems  together  constitute  a 
sufficient  condition  for  that  criterion. 
2.  The Problem, Basic Definitions, and Assumptions

As  mentioned  above,  the  problem  considered  in  this  paper  is  the  design  of  systems  that 
can  support  concurrent  manipulations  on  a  binary  search  tree.  (For  a  general  discussion  of 
binary  search  trees,  see,  e.g.,  [10]).  We  hope  to  achieve  maximum  concurrency  without 
impairing  the  correctness  of  the  systems.  In  the  following,  we  shall  first  describe  the  data 
structure  shared  by  all  of  the  concurrent  processes,  and  then  define  the  problem  more 
precisely. 


2.1  The  Data  Structure 

The  data  structure  consists  of  a  directed  graph  and  a  queue,  called  GC-queue.  The  binary 
tree  is  embedded  in  the  directed  graph.  Let  the  nodes  of  the  graph  be  labelled  by  integers 
1,...,M, and the node labelled by n be in memory location n for all n = 1,...,M. Node n (or simply
n)  is  used  to  refer  to  either  the  node  labelled  by  n  or  the  pointer  to  that  node,  depending  on 
the  context.  For  the  purpose  of  this  paper,  we  assume  that  each  node  contains  six  fields:  a 
left  pointer  field,  a  right  pointer  field,  a  back  pointer  field,  a  color  field,  a  value  field  and  a 
lock field. A pointer field contains either a pointer to a node or the null pointer "λ." The
value  field  contains  a  value  from  a  linearly  ordered  set.  The  color  field  contains  the  color  of 
the  node,  which  is  "white"  or  "blue":  nodes  on  the  binary  tree  are  always  white  and  blue 
nodes  are  never  on  the  tree.  (The  use  of  this  notation  was  motivated  by  the  availability  of 
colored  chalk.)  The  lock  field  of  a  node  is  set  by  a  process  in  order  to  gain  the  right  to 
modify  that  node.  Only  one  process  at  a  time  may  have  any  given  node  locked. 
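
As a concrete picture of the node layout, the six fields can be written as a small Python class. This is a sketch under assumptions: the paper identifies a node with a memory location n, whereas here object references play that role, `None` stands for the null pointer, and the names `Node`, `WHITE`, and `BLUE` are chosen only for illustration.

```python
import threading
from dataclasses import dataclass, field
from typing import Optional

WHITE, BLUE = "white", "blue"

@dataclass(eq=False)            # eq=False: nodes are compared by identity
class Node:
    value: float = 0.0
    left: Optional["Node"] = None     # left pointer field
    right: Optional["Node"] = None    # right pointer field
    back: Optional["Node"] = None     # back pointer field
    color: str = WHITE                # "white" on the tree, "blue" off it
    lock: threading.Lock = field(default_factory=threading.Lock, repr=False)

# ROOT carries the value "infinity"; the tree hangs off its left pointer.
ROOT = Node(value=float("inf"))
a = Node(value=1)
ROOT.left = a

# A process sets the lock field to gain the right to modify the node.
with a.lock:
    a.right = Node(value=2)
```

Because `threading.Lock` admits one holder at a time, it matches the rule that only one process at a time may have a given node locked.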

The  pointer  contained  in  the  left,  right  or  back  pointer  field  of  node  n  is  called  the  left, 
right, or back pointer of n and is denoted by n.left, n.right or n.back respectively. Similarly,
the contents in the color, value, and lock fields of node n are denoted by n.color, n.value, and
n.lock, respectively. The topology of the graph is determined by the pointers of the nodes in
the graph. Let m and n be any two nodes. If m.left (respectively, m.right, m.back) = n, we
say that n is the left (right, back) son of m and that m points to n through the left (right,
back) pointer of m. There are two special nodes denoted by ROOT and FREE. Node ROOT
corresponds to the root of the binary search tree. It is assumed that ROOT.value = "infinity,"
which is a value greater than any value one can search for. Node FREE points (through its
left pointer) to the first node of a list, called the free list (cf. Fig. 2-1), which is a sequence of
any number of nodes n_1, n_2, ..., n_k, satisfying the following properties:

F1.  FREE.left = n_1, n_k.right = λ.

F2.  For 1 ≤ i < k, n_i.right = n_(i+1).

F3.  For 1 ≤ i ≤ k, n_i.color = white.

F4.  FREE.right = n_k.
Figure 2-1: The FREE node and free list.
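
Properties F1 to F4 can be checked mechanically. The following Python sketch is an illustration only: the class `Node` and the functions `build_free_list` and `satisfies_f1_f4` are hypothetical names, and `None` stands for the null pointer.

```python
class Node:
    def __init__(self, color="white"):
        self.left = None
        self.right = None
        self.color = color

def build_free_list(k):
    """Return a FREE header whose free list holds k white nodes (k >= 1)."""
    nodes = [Node() for _ in range(k)]
    for x, y in zip(nodes, nodes[1:]):
        x.right = y              # F2: n_i.right = n_(i+1)
    free = Node()
    free.left = nodes[0]         # F1: FREE.left = n_1
    free.right = nodes[-1]       # F4: FREE.right = n_k
    return free

def satisfies_f1_f4(free):
    """Walk the list from FREE.left along right pointers and check F1-F4."""
    n, last = free.left, None
    seen = set()
    while n is not None:
        if id(n) in seen or n.color != "white":   # a cycle, or F3 violated
            return False
        seen.add(id(n))
        last, n = n, n.right
    # The walk ended at a null right pointer (F1); that node must be FREE.right (F4).
    return last is not None and last is free.right
```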

Garbage  (blue)  nodes  are  nodes  that  have  been  deleted  from  the  tree.  Garbage  nodes  are 
always inserted into the GC-queue. Through the garbage collection, nodes in the queue are
appended  to  the  free  list  and  are  then  ready  to  be  reused. 


2.2  Concurrent  Processes  on  the  Tree 

We  wish  to  perform  processes  of  the  following  types  concurrently  on  the  tree  structure: 

-  Insertion is the process of adding a value to the tree, if the value is not
already  in  the  tree. 

-  Deletion  is  the  process  of  removing  an  existing  value  from  the  tree. 

-  Rotation  is  the  process  of  "rotating"  a  (sub)tree  so  that  the  heights  of  its 
subtrees  can  be  adjusted.  Rotation  is  typically  used  for  balancing  a  tree  (see, 
for  example,  [10],  p.  454).  In  this  paper  rotation  is  also  used  for  performing 
deletion  (see  Section  5). 

-  Searching  is  the  process  of  looking  for  a  node  with  a  given  value  v  in  the  tree. 

If  a  node  with  value  v  exists,  then  the  search  is  successful,  otherwise  it  is 
unsuccessful.  Searching  does  not  modify  the  tree,  and  is  often  used  by  other 
processes. For example, if we wish to delete a value from the tree, then we must
first  search  for  that  value  in  the  tree,  since  if  it  is  not  present,  we  cannot  delete 
it. 

-  Garbage collection is the process of appending garbage nodes to the free list so
that they can be reused.

For correctness reasons we allow a process to lock one or several nodes against modification
by  another  process.  But,  for  achieving  a  high  degree  of  concurrency  we  require  that  the 
number  of  nodes  locked  by  any  process  at  any  one  time  be  bounded  by  a  small  constant.  In 
addition,  we  try  to  delay  searches  as  little  as  possible,  since  in  general  searching  is  done  far 
more  often  than  modifying.  The  operation  of  locking  (respectively,  unlocking)  a  node  n  is 
denoted  by  "lock(n)"  ("unlock(n)"). 


2.3  Definitions  for  Correctness 


We  say  that  a  concurrent  system  for  manipulating  a  binary  search  tree  is  correct  if  the 
system  possesses  the  following  properties: 

P1.  The tree is always consistent. At any time, if we freeze the current tree, then an
inorder  traversal  (see,  [10],  p.  316)  of  the  tree  generates  the  nodes  with  values 
in  sorted  order. 

P2.  The termination position of a search is always consistent. The termination
position  of  a  (successful  or  unsuccessful)  search  is  defined  to  be  the  last  node 
whose  value  is  examined  by  the  search  before  it  is  terminated.  Consider  a 
search  for  value  v.  Suppose  that  it  terminates  at  node  n  at  time  t.  We  require 
that  at  the  instant  t  if  we  freeze  the  tree  and  start  a  new  search  for  the  same 
value  v  from  the  root  then  n  must  be  the  termination  position  of  the  new  search. 

P3.  There  is  no  deadlock. 

P4.  An intended update is always carried out. An insertion, deletion or rotation
process  will  indeed  insert,  delete  or  rotate  as  intended,  before  it  terminates. 

P5.  A value v can be added to or deleted from the tree only by the insertion or deletion
process,  respectively.  (These  processes  are  defined  later.)  In  particular,  only 
nodes  which  are  no  longer  reachable  by  any  existing  or  future  search  will  be 
garbage  collected,  and  all  such  nodes  will  be  garbage  collected. 

Property P1 is clearly needed for maintaining the binary search tree. The necessity of
properties P3, P4 and P5 is also obvious. For property P2, we note that if it is not satisfied
then two searches for the same value may conclude differently on the same tree. Property
P2 is important to insertion and deletion processes, since searching is performed in those
processes. In fact, in Appendix III we shall show that properties P1 to P5 are sufficient
conditions for a natural correctness criterion for concurrent search systems.
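
The consistency property P1 translates directly into a check on a frozen tree. The sketch below is an illustration with hypothetical names (`Node`, `inorder`, `is_consistent`); it assumes values are distinct, so "sorted order" is tested as strictly increasing.

```python
import math

class Node:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

def inorder(n):
    """Values of the subtree rooted at n, in inorder."""
    return [] if n is None else inorder(n.left) + [n.value] + inorder(n.right)

def is_consistent(root):
    """P1 on a frozen tree: the inorder traversal is in sorted order."""
    values = inorder(root)
    return all(x < y for x, y in zip(values, values[1:]))

# ROOT holds "infinity", so every real value lies in its left subtree.
good = Node(math.inf, left=Node(10, Node(5), Node(15)))
bad = Node(math.inf, left=Node(10, Node(12), Node(15)))
```

Here `is_consistent(good)` holds, while `bad` fails because 12 sits in the left subtree of 10.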


2.4  Basic  Assumptions 

We  shall  prove  correctness  of  a  system  based  on  the  following  assumptions: 
A1.  The tree is consistent initially, before any process has acted on it.

A2.  The search, insertion, and rotation processes, which are defined later, are
"correct"  when  executed  alone,  in  the  sense  that,  starting  from  a  consistent  tree, 
the  search  process  will  find  a  value  if  and  only  if  the  value  is  in  the  tree  and  the 
insertion  and  rotation  processes  preserve  the  consistency  of  the  tree. 

A3.  Each  process  can  read  or  write  on  individual  fields  of  a  node  as  an  indivisible 
step. 

A4.  If  process  A  attempts  to  lock  a  node  which  is  already  locked  by  process  B,  then 
A must wait for B to unlock the node. In this case, we say that process A is
blocked  (by  process  B)  at  the  node. 

A5.  The  procedures  create  and  append,  defined  in  Sections  3  and  6,  for  manipulating 
the  free  list  are  correct  in  the  sense  that  they  will  preserve  the  properties  of 
the  free  list  (cf.  F1-F4  in  Section  2.1). 

Notice  that  to  have  processes  satisfying  A2  and  A5  is  quite  standard.  So  for  clarity  in  this 
paper  we  chose  to  assume  A2  and  A5  rather  than  to  prove  them. 


2.5  Database  Record  Considerations 

This  paper  is  not  primarily  concerned  with  the  problem  of  updating  records  associated  with 
the  keys  in  a  database;  rather,  we  focus  on  the  problems  of  concurrent  reorganization  of  the 
part  of  the  database  containing  the  key  structure.  Here  we  will  digress  briefly  to  suggest 
one  possible  method  for  associating  records  with  the  keys  in  the  tree. 

To  each  node,  we  would  add  an  additional  field  (which  is  ignored  in  the  remainder  of  this 
paper):  the  record  field.  This  field  contains  a  pointer  to  the  record  associated  with  the  key 
stored  in  that  node.  A  specific  implementation  may  decide  to  put  this  record  on  the  disk  or  in 
main  memory.  Regardless,  the  pointer  in  the  node  points  to  some  large  chunk  of  data  that 
constitutes  the  associated  record.  For  each  individual  record,  we  would  view  that  record  as  a 
distinct  database.  To  change  information  in  this  node,  we  might  lock  the  whole  record  (as 
distinct  from  locking  the  node  itself).  Alternatively,  since  we  view  the  record  as  a  database, 
we  could  maintain  its  consistency  as  we  would  in  a  general  database. 
3.  A Search-Insertion System

In  this  section  we  describe  a  system  which  can  support  any  number  of  concurrent  searches 
and  insertions  on  a  binary  search  tree,  and  prove  the  correctness  of  this  system.  The 
procedures  and  correctness  proof  methods  presented  in  this  section  will  form  the  basis  for 
constructing  and  proving  more  complex  systems  considered  in  later  sections  of  the  paper. 

3.1  An  Example 

We  want  to  demonstrate  that  a  concurrent  system  consisting  of  the  usual  sequential 
searching  and  insertion  processes  without  modifications  is  incorrect.  Consider  Example  3.1  on 
a simple tree with ROOT.left = a and a.value = 1, as depicted in figure 3-1. In the example,
variables  s  and  r  are  local  to  processes  search(2)  and  insert(2),  respectively. 


Figure  3-1:  A  simple  tree. 
         search(2)                    insert(2)

         <previous steps -- start at ROOT>

  1.     s ← a
  2.                                  r ← a
  3.     2 > s.value (= 1)
  4.     s ← s.right
  5.     s is λ
  6.                                  2 > r.value (= 1)
  7.                                  r ← r.right
  8.                                  r is λ
  9.                                  insert a node with
                                      value 2 as the right
                                      son of node a
 10.     Value 2 does not exist!

                         Example 3.1

Note  that  at  step  10  the  search  incorrectly  concludes  that  the  value  2  does  not  exist  in  the 
tree.  Equivalently,  property  P2  is  not  satisfied  at  the  time  when  the  search  terminates.  The 
problem  can  be  solved  by  introducing  some  locking  scheme  into  the  sequential  processes. 
This  modification  is  described  below. 
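
The interleaving of Example 3.1 can be replayed deterministically. The following Python sketch is an illustration only: `Node` and `tp_root` are hypothetical names, `None` stands for the null pointer, and the steps are grouped by process, which preserves the one ordering that matters (the search reads a.right before the insertion writes it).

```python
import math

class Node:
    def __init__(self, value):
        self.value, self.left, self.right = value, None, None

ROOT = Node(math.inf)
a = Node(1)
ROOT.left = a

def tp_root(v):
    """Termination position of a fresh sequential search for v from ROOT."""
    f = ROOT
    while True:
        s = f.left if v < f.value else f.right
        if s is None:
            return f            # unsuccessful search ends at f
        if s.value == v:
            return s            # successful search ends at the node found
        f = s

# search(2), steps 1 and 3-5: it reads a.right while that pointer is still null.
s = a                           # step 1
search_end = s                  # step 3: last node whose value is examined
s = s.right                     # step 4
search_saw_null = s is None     # step 5

# insert(2), steps 2 and 6-9: same descent, then the new node is attached.
r = a                           # step 2
r = r.right                     # steps 6-8: r is null
a.right = Node(2)               # step 9

# Step 10: the search reports "does not exist" from its stale read.
search_reports_absent = search_saw_null
fresh_end = tp_root(2)          # where a new search from the root would end
```

At step 10 the search terminates at node a, whereas a new search from the root terminates at the node holding 2, which is exactly the violation of P2.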

3.2  The  System 
The  Search  Process 

Search: This procedure searches for a node in the tree with a given value, v.

procedure search(v)
    (f,dir) ← find(ROOT,v);
    s ← f.dir;
    if s ≠ λ then print "Value v is at node s"
    else print "Value v is not in the tree" fi;
    unlock(f);

The procedure find(n,v) is defined below. It consists of the usual descent through a tree
and  is  expressed  recursively  for  clarity.  It  is  readily  seen  that  the  termination  position  of 
search(v)  (intuitively,  the  node  for  which  we  are  looking)  is  f.dir  if  the  search  is  successful 
and  is  f  if  the  search  is  unsuccessful.  The  procedure  find  is  an  auxiliary  procedure  that  is 
used  by  several  processes  considered  in  this  paper. 

Find: The following procedure searches for a node with value v, starting from node n, with
n.value ≠ v. (Recall that we assumed that ROOT.value = "infinity." Hence find(ROOT,v) is
always well-defined since ROOT.value > v for all v.) The procedure returns a pair (f,dir),
where f is a node and dir is a direction (left or right). At the time when the procedure
returns (f,dir), node f is locked, and has the property that if value v exists in the tree
then f points through the pointer dir to a node whose value is v (i.e. [f.dir].value = v),
otherwise f.dir = λ.

procedure find(n,v)
    f ← n;
    if v < f.value then dir ← left else dir ← right fi;
    s ← f.dir;                          /*Choose correct son*/
    if s ≠ λ and s.value ≠ v then
        return find(s,v)                /*Recurse*/
    else
        lock(f);
        if s ≠ f.dir then               /*It slipped away (see note below)*/
            unlock(f);
            return find(f,v)            /*So recurse*/
        else return (f,dir) fi          /*Found it*/
    fi


Note that after the lock(f) operation the process makes sure that f is still the father of s
(i.e., s = f.dir). This is necessary, since another concurrent insertion process might have
changed f.dir and unlocked f between the time that find decided that "s = λ or s.value = v" and
the time that find tried to lock(f). In this case, find must resume the search at node f again.
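
A minimal Python sketch of find and search follows. It makes several assumptions that are not part of the paper: `threading.Lock` stands in for the lock field, `None` for the null pointer, the direction is passed as the attribute name `"left"` or `"right"`, and the recursion is unrolled into a loop.

```python
import math
import threading

class Node:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None
        self.lock = threading.Lock()

def find(n, v):
    """Return (f, d) with f locked; getattr(f, d) holds v or is None."""
    f = n
    while True:
        d = "left" if v < f.value else "right"
        s = getattr(f, d)                      # choose the correct son
        if s is not None and s.value != v:
            f = s                              # keep descending, no lock taken
            continue
        f.lock.acquire()                       # lock the candidate father
        if s is getattr(f, d):                 # f.dir unchanged since it was read
            return f, d
        f.lock.release()                       # it slipped away: retry from f

def search(root, v):
    """True if v is in the tree at the moment the father is locked."""
    f, d = find(root, v)
    try:
        return getattr(f, d) is not None
    finally:
        f.lock.release()

# A small hand-built tree under ROOT (whose value is "infinity").
ROOT, n10, n5, n15 = Node(math.inf), Node(10), Node(5), Node(15)
ROOT.left, n10.left, n10.right = n10, n5, n15
```

The descent itself takes no locks; only the final father is locked, and the re-read of f.dir under that lock is what detects a pointer changed by a concurrent insertion.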


The  Insertion  Process 

Insertion: This procedure inserts a node with value v into the tree (at one of the leaves), if
no  such  node  already  exists  in  the  tree. 

procedure insert(v)
    (f,dir) ← find(ROOT,v);
    if f.dir ≠ λ then
        print "Value v is already in the tree";
        unlock(f)
    else
        create(w);                      /*Build a node*/
        w.left ← λ;
        w.right ← λ;
        w.value ← v;
        f.dir ← w;                      /*Point to it*/
        unlock(f)
    fi


The  procedure  create(w),  which  is  a  standard  free  list  manipulation  procedure  (with 
synchronization), is defined as follows:

Create:  This  procedure  creates  a  new  node  named  w,  by  removing  it  from  the  free  list. 

procedure create(w)
    lock(FREE);
    if FREE.left = FREE.right then
        abort the process which calls create and inform
        the system that the free list is empty
    else
        w ← FREE.left;
        FREE.left ← [FREE.left].right
    fi
    unlock(FREE);

At  this  point  the  reader  is  advised  to  convince  himself  that  the  locking  scheme  used  in 
procedure  find  indeed  solves  the  problem  demonstrated  by  Example  3.1.  It  is  also  instructive 
to  note  that  a  search  process  is  never  blocked  by  other  processes,  except  possibly  at  the 
time  right  before  it  terminates.  This  property  holds  for  all  the  systems  considered  in  the 
paper. 
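
The search-insertion system as a whole can be exercised with a few threads. The following Python sketch is an illustration under assumptions, not a definitive implementation: `threading.Lock` plays the lock field, `None` the null pointer, "abort the process" becomes raising `MemoryError`, and the helper `fill_free_list`, the worker setup, and the chosen values are hypothetical.

```python
import math
import threading

class Node:
    def __init__(self, value=None):
        self.value, self.left, self.right = value, None, None
        self.lock = threading.Lock()

ROOT, FREE = Node(math.inf), Node()

def fill_free_list(k):
    """Chain k fresh nodes behind FREE through their right pointers."""
    nodes = [Node() for _ in range(k)]
    for x, y in zip(nodes, nodes[1:]):
        x.right = y
    FREE.left, FREE.right = nodes[0], nodes[-1]

def create():
    """Detach the first free node; raises when only the last node is left."""
    with FREE.lock:
        if FREE.left is FREE.right:
            raise MemoryError("free list is empty")
        w = FREE.left
        FREE.left = w.right
        return w

def find(n, v):
    """Return (f, d) with f locked; getattr(f, d) holds v or is None."""
    f = n
    while True:
        d = "left" if v < f.value else "right"
        s = getattr(f, d)
        if s is not None and s.value != v:
            f = s
            continue
        f.lock.acquire()
        if s is getattr(f, d):
            return f, d
        f.lock.release()                 # f.dir changed: retry from f

def insert(v):
    """Insert v at a leaf; return False if v is already in the tree."""
    f, d = find(ROOT, v)
    try:
        if getattr(f, d) is not None:
            return False
        w = create()                     # build a node
        w.left = w.right = None
        w.value = v
        setattr(f, d, w)                 # the single pointer change
        return True
    finally:
        f.lock.release()

def inorder(n):
    return [] if n is None else inorder(n.left) + [n.value] + inorder(n.right)

fill_free_list(200)
added = []                               # successful insertions per worker

def worker(i):
    # Each worker tries all of 0..100, in a different order per worker.
    added.append(sum(insert((7 * j + 13 * i) % 101) for j in range(101)))

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Every worker attempts the same 101 values, so exactly 101 insertions should succeed in total and the inorder traversal of ROOT.left should list 0 through 100 in order, whatever the interleaving.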


3.3  Property P2': A Property for Proving P2

It  is  difficult  to  prove  property  P2  defined  in  Section  2.3  by  induction  on  time,  since  it  is 
only  meaningful  as  a  property  at  the  termination  time  of  a  search.  Here  we  define  another 
property, property P2', which implies property P2 but is more convenient to use for the
(inductive)  correctness  proofs  in  later  sections  of  the  paper. 

Consider  a  search  for  value  v  (denoted  by  search(v)).  Suppose  that  the  search  starts  and 
terminates at time t_0 and t_1, respectively. For any t ∈ [t_0, t_1], we define TP(t) and TP_root(t)
as follows. (TP(t) and TP_root(t) denote termination positions.) Suppose that at time t we
"freeze"  the  current  tree  (i.e.  its  structure)  in  the  following  sense:  after  time  t,  no  process  is 
allowed  to  make  any  change  on  any  pointer  field,  but  each  process  must  proceed  to  the  point 
where  it  must  make  a  pointer  change,  or  it  is  blocked  by  another  process.  As  it  so  proceeds, 
it  locks  and  releases  the  same  locks  as  it  would  ordinarily.  The  important  point  here  is  that 
the  structure  of  the  tree  is  not  changed,  but  all  processes  proceed  as  far  as  they  can  to 
avoid  impeding  a  search  through  the  tree. 

Now consider the continuation of search(v) on the tree frozen at time t, with the search
starting from wherever it was at time t. Then with respect to the tree frozen at time t,
search(v) may or may not terminate, depending upon whether or not the node it has to lock is
already locked by another process. We define TP(t) to be the termination position of
search(v) if it terminates; otherwise TP(t) is undefined. Similarly, with respect to the same
tree frozen at time t, we define TP_root(t) to be the termination position of a new search that
starts searching from the root of the tree for the same value v. TP_root(t) is undefined if the
new search does not terminate.

Property P2' is stated as follows:

P2'.  For any search which starts at t_0 and terminates at t_1,

TP(t) = TP_root(t)

for any t ∈ [t_0, t_1] for which TP(t) is well-defined.

In this new terminology, property P2 can be expressed as:

P2.  For any search,

TP(t_1) = TP_root(t_1)

where t_1 is the termination time of the search.

It is seen that P2' implies P2, since by the definition of t_1, TP(t_1) is well-defined.

3.4  The  Correctness  Proof  of  the  System 

We only need be concerned with properties P1 and P2'; it is trivial that the system
satisfies properties P3, P4 and P5. By assumption A1, properties P1 and P2' hold initially.
We assume (inductively) that properties P1 and P2' hold up to time t, when a change to the
tree structure, f.dir ← w, is made. For proof purposes we may assume that no two operations
are done at exactly the same time. Hence we may choose ε > 0 so that in the interval [t-ε, t+ε]
the change at time t is the only operation done by any process. Let T⁻ and T⁺ be the tree
frozen at times t-ε and t+ε, respectively. Note that T⁺ is the tree resulting from T⁻ by adding
w as the "dir" son of f.* This is illustrated in Fig. 3-2, assuming "dir" equal to "right."

We  wish  to  prove  the  following  two  assertions  (a)  and  (b). 

a.  Property P1 holds at time t+ε. Consider the insertion responsible for the change
at time t. Note that the insertion process is simply a search followed by a
pointer change. Since the search satisfies P2' at time t-ε, one can view that the
change at time t is done by the insertion executed alone on tree T⁻. Therefore
by assumption A2 property P1 holds at time t+ε.

*  In this paper, we use the notation "f.dir" to refer to the node pointed to by node f in the direction specified by
"dir" (which is usually a variable), or to refer to a pointer to that node. The notation "dir'" refers to the direction
complementary to "dir". Hence if dir = left then dir' = right, and vice versa.

Figure 3-2: Trees T⁻ and T⁺ describing the status of
the system at times t-ε and t+ε, respectively.

b.  Property P2' holds at time t+ε. Consider a search process, say, search(v), which
starts before time t-ε and terminates after t+ε (i.e. it is in progress when the
pointer is changed at time t). By the inductive assumption that P2' holds up to
time t, we know that the assertion is true for tree T⁻. For the purpose of
proving P2' at time t+ε, we can assume that TP(t+ε) is well-defined. In the
following, we wish to prove that on T⁺ the termination position TP(t+ε) of
search(v) coincides with that, TP_root(t+ε), of another search process, namely,
find(ROOT,v).

Case 1.  TP(t-ε) is well-defined. Then TP(t-ε) (= TP_root(t-ε)) must not be
node f, since f is locked at time t-ε by the insertion process responsible
for the change f.dir ← w, and TP(t-ε) = f would make TP(t+ε) undefined. This
implies that TP(t) and TP_root(t) are constants over [t-ε, t+ε], since the
change f.dir ← w has no effect on them. (We rely, of course, on assumption
A2 of the correctness of the pointer change involved.) Therefore TP(t+ε) =
TP_root(t+ε).

Case 2.  TP(t-ε) is not well-defined. Since TP(t+ε) is well-defined, in
defining TP(t-ε) the continuation of the search, search(v), on T⁻ must be
blocked at node f. There are two cases, depending on the state of the
process search(v) at time t-ε:

i.  The process search(v) has not yet examined f.dir at time t-ε. Then
on T⁺ search(v) will correctly reach either f.dir' or f.dir as
find(ROOT,v) does, since the search uses the updated f.dir.

ii.  The process search(v) has read f.dir as λ at time t-ε. Then on T⁺,
search(v) will find f.dir (= w) ≠ λ and thus start searching from f. This
implies that search(v) will again correctly reach w as find(ROOT,v)
does.


We  have  shown  that  the  pointer  change  done  by  an  insertion  process  preserves  properties 
P1 and P2'. Therefore, by induction, P1 and P2' always hold.


3.5  Comments  on  Locking 

Notice  that  the  find  procedure  locks  the  father  of  the  node  whose  key  has  the  value  for 
which  find  is  searching.  (Consequently,  the  search  procedure  also  locks  that  node.)  However, 
for purposes of simply reporting the "instantaneous" existence of a key in the database the
find  and  search  procedures  can  be  modified  by  deleting  the  lock/unlock  calls  to  provide  the 
ability  to  search  without  locking.  While  we  have  omitted  those  versions  of  find  and  search 
from  the  present  paper  for  purposes  of  clarity,  we  would  certainly  include  such  procedures 
in  a  full  system,  where  simultaneous  examination  of  nodes  by  several  processes  was  likely  to 
occur. 
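
A search without locking might look as follows in Python. This is a sketch under assumptions: the function name `contains` and the `Node` class are hypothetical, `None` stands for the null pointer, and the descent is written as a plain sequential loop rather than as the omitted versions of find and search.

```python
import math

class Node:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

def contains(root, v):
    """Descend from root without taking any lock; True if v is encountered."""
    n = root
    while n is not None:
        if n.value == v:
            return True
        n = n.left if v < n.value else n.right
    return False

ROOT = Node(math.inf, left=Node(10, Node(5), Node(15)))
```

The answer is only an instantaneous report: since no node is locked, a concurrent insertion may add the value immediately after the function returns False.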

Notice,  however,  that  locking  the  associated  record  is  often  necessary.  For  example, 
locking  would  be  used  in  the  case  that  some  modification  will  be  made  to  the  associated 
record,  once  the  key  is  found  (see  Section  2.5).  In  this  case,  we  would  lock  the  record  (or 
possibly  some  segment  of  the  record)  to  prevent  change  by  another  process. 

Similarly,  record  locking  may  be  necessary  during  prolonged  examination  of  a  node  and  its 
record.  For  example,  if  we  wish  to  guarantee  that  a  key  continues  to  exist  while  we  examine 
its  record,  then  we  must  lock  the  node  containing  that  key  to  prevent  deletion  by  another 
process.  Again,  we  might  instead  wish  to  lock  some  segment  of  the  record  to  prevent 
modification  to  that  segment. 

4.  A  Search-Insertion-Rotation  System

In  this  section  we  extend  the  system  of  Section  3  to  include  rotation  processes.  Important 
ideas  of  this  paper  such  as  the  use  of  back  pointers,  copying  and  blue  nodes  are  introduced 
in  this  section. 


4.1  An  Example 

The  following  example  illustrates  the  kind  of  problems  we  might  encounter  when  rotations 
are  executed  concurrently  with  search.  Consider  Example  4.1  for  rotating  and  searching  the 
tree  shown  in  Fig.  4-1(a).


Figure 4-1: Rotating (b,c) to the left: (a) the original tree; (b) the rotated tree.
α, β and δ are subtrees.


        search(20)                      rotation

 1.     s ← a
 2.     20 > s.value (=5)
 3.     s ← s.right (=b)
 4.     20 > s.value (=10)
 5.                                     a.right ← c
 6.                                     c.left ← b
 7.                                     b.right ← β
 8.     s ← s.right (=b.right=β)

Example 4.1

Note that at step 8, the search starts to search subtree β for value 20 in the rotated tree
(i.e. the tree in Fig. 4-1(b)), while at this time a search from the root for the same value 20
(in the same tree) will terminate in subtree δ. Property P2 is therefore violated. Our
solution to this problem is to first establish a rotated version of the structure in a copy
outside the tree. (In particular, we create copies b' and c' of nodes b and c in Figure 4-2.)
We then connect the copy into the tree by changing just one pointer from node a, which is
an indivisible operation. The nodes in the old structure are changed to blue nodes and
inserted into GC-queue, and are to be collected by garbage collectors. (The garbage
collection process will be described in Section 6.) By providing "back pointers," we ensure
that those search processes which are at blue nodes can still come out to reach their
"correct" termination positions. The result of rotating the tree shown in Fig. 4-1(a) using this
new method is illustrated in Fig. 4-2.


Figure 4-2: Results of rotation (of the tree in Fig. 4-1(a))
using the idea of copying. In the diagram, blue (or garbage) nodes
are indicated by dotted circles, and back pointers by dashed lines.
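
The interleaving of Example 4.1 can be replayed deterministically. The following is a minimal sketch, not part of the original systems: the keys 15, 12 and 20 chosen for c, β and δ are hypothetical (the paper fixes only a=5 and b=10), and the names Node and descend are illustrative. It shows an in-place rotation leaving a suspended search in the wrong subtree.

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

def descend(s, v):
    """Continue an ordinary BST search for v from node s; return the node or None."""
    while s is not None and s.value != v:
        s = s.left if v < s.value else s.right
    return s

# Tree of Fig. 4-1(a) with assumed keys: beta = {12}, delta = {20}, c = 15.
beta, delta = Node(12), Node(20)
c = Node(15, beta, delta)
b = Node(10, None, c)
a = Node(5, None, b)

s = a              # step 1
s = s.right        # steps 2-3: 20 > 5, so the search moves to b
                   # step 4: 20 > 10, so the search decides to go right

# Steps 5-7: the rotation is done in place while the search is suspended at b.
a.right = c
c.left = b
b.right = beta

stale = descend(s.right, 20)   # step 8: the suspended search resumes in beta
fresh = descend(a, 20)         # a search restarted from the root

print(stale, fresh.value)      # the stale search misses the key 20
```

The suspended search reports the key as absent although it is in the tree, which is the violation of property P2 described above.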

4.2  The  System

The rotation process, which follows, performs a rotation by building a copy of the section
of the tree to be altered and then replacing the old section with the modified new section. In
this version of the rotation procedure, we include some code that will only be useful when
used (in Section 5) with the deletion procedure. For example, locking the new nodes (b' and
c') is unnecessary for the rotation procedure itself, since once the procedure switches from
the old version to the new version by a pointer change, it no longer uses b' and c'.

Also  of  interest  is  the  use  of  a  "back  pointer,"  which  was  mentioned  earlier.  This  pointer 
has  no  meaning  for  ("white")  nodes  that  are  part  of  the  tree.  However,  for  "blue"  nodes,  the 
back  pointer  is  used  to  continue  the  search  when  the  father  of  the  node  for  which  find  is 
searching  has  been  deleted  (made  blue)  while  find  was  deciding  whether  the  node  was,  in
fact,  the  father  of  the  desired  node. 


The  Rotation  Process 

Rotation: Suppose that a, a.dir1 and [a.dir1].dir2 are three consecutive white nodes on a
path. The following procedure moves a.dir1 away from the path by performing a
rotation.* It is assumed that a and a.dir1 are locked when this procedure is called. The
procedure returns (a,c',b') where c' = a.dir1 and b' = c'.dir2', with c', b' locked.

procedure rotation(a,dir1,dir2)
    b ← a.dir1;
    c ← b.dir2;
    create(b'); create(c');                           /*Set up new nodes*/
    lock(c'); lock(b'); lock(c);
    c'.dir2 ← c.dir2; c'.dir2' ← b'; c'.value ← c.value;
    b'.dir2 ← c.dir2'; b'.dir2' ← b.dir2'; b'.value ← b.value;
    a.dir1 ← c';                                      /*Change the tree*/
    b.back ← a;                                       /*Back pointers*/
    color b blue;                                     /*And blue nodes*/
    c.back ← c';
    color c blue;
    enqueue nodes b and c in GC-queue;                /*For garbage collection*/
    unlock(a); unlock(b); unlock(c);                  /*Unlock a, b, c*/
    return (a,c',b')


* In this paper we do not wish to restrict our result to any specific type of balanced tree such as the AVL tree.
Therefore, for the balancing purpose, schemes of deciding where rotations should take place will not be specified.
Recent schemes by Guibas and Sedgewick [9] on detecting rotations to be performed based on local information seem to
be particularly suitable to our systems.
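
As an illustration of the rotation procedure, here is a minimal Python sketch under stated assumptions: nodes are plain objects carrying a threading.Lock, node creation stands in for create, a Python list stands in for GC-queue, and the helper other (the opposite direction, written dir' in the pseudocode) and all key values are hypothetical. It is a sketch of the copying idea, not a definitive implementation.

```python
import threading

class Node:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right
        self.blue = False              # True once the node has been replaced
        self.back = None               # back pointer, meaningful only when blue
        self.lock = threading.Lock()

def other(d):
    """The opposite direction (dir' in the pseudocode)."""
    return "right" if d == "left" else "left"

gc_queue = []                          # stand-in for GC-queue

def rotation(a, dir1, dir2):
    """Caller holds the locks of a and a.dir1. Returns (a, c', b') with c', b' locked."""
    b = getattr(a, dir1)
    c = getattr(b, dir2)
    b2, c2 = Node(b.value), Node(c.value)      # the copies b' and c'
    c2.lock.acquire(); b2.lock.acquire(); c.lock.acquire()
    setattr(c2, dir2, getattr(c, dir2))
    setattr(c2, other(dir2), b2)
    setattr(b2, dir2, getattr(c, other(dir2)))
    setattr(b2, other(dir2), getattr(b, other(dir2)))
    setattr(a, dir1, c2)               # the single pointer change into the tree
    b.back = a                         # back pointers first,
    b.blue = True                      # then mark the old nodes blue
    c.back = c2
    c.blue = True
    gc_queue.extend([b, c])
    a.lock.release(); b.lock.release(); c.lock.release()
    return a, c2, b2

# Usage on the shape of Fig. 4-1(a), with assumed keys.
beta, delta = Node(12), Node(20)
c = Node(15, beta, delta)
b = Node(10, None, c)
a = Node(5, None, b)
a.lock.acquire(); b.lock.acquire()     # the caller's obligation
_, c2, b2 = rotation(a, "right", "right")
```

Note that the old nodes b and c keep their original sons, so a search suspended at either of them still sees a consistent (if outdated) structure until it follows a back pointer.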

The search and insertion processes are the same as those defined in Section 3. But the
procedure find(n,v) must be redefined (as follows) to handle the presence of blue nodes. In
particular, it must consider the possibility that f became blue between the decision to lock it
and the actual locking of the node.

Find:  The  modified  find  procedure.  This  version  of  find  uses  deleted  ("blue")  nodes  and  back 
pointers  in  order  to  continue  searching  from  a  deleted  node. 

procedure find(n,v)
    f ← n;
    if v < f.value then dir ← left else dir ← right fi;
    s ← f.dir;                                        /*Find son*/
    if s≠λ and s.value≠v then
        return find(s,v)                              /*Next level*/
    else
        lock(f);
        if f is blue then                             /*Just missed getting node*/
            unlock(f);
            return find(f.back,v)                     /*Follow back pointer from blue node*/
        else
            if s≠f.dir then
                unlock(f);                            /*Some process changed it*/
                return find(f,v)
            else return (f,dir)                       /*Found it*/
            fi
        fi
    fi
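
The behaviour of the modified find on blue nodes can be traced with a small Python sketch. The state below is built by hand to resemble Fig. 4-2 after the rotation; the key values, the old_/new_ names and the use of threading.Lock are assumptions made for illustration only.

```python
import threading

class Node:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right
        self.blue, self.back = False, None
        self.lock = threading.Lock()

def find(n, v):
    """Return (f, dir) with f locked; f.dir holds v, or is None if v is absent."""
    f = n
    d = "left" if v < f.value else "right"
    s = getattr(f, d)                      # find son
    if s is not None and s.value != v:
        return find(s, v)                  # next level
    f.lock.acquire()
    if f.blue:                             # just missed getting the node
        f.lock.release()
        return find(f.back, v)             # follow back pointer from blue node
    if s is not getattr(f, d):             # some process changed f.dir
        f.lock.release()
        return find(f, v)
    return f, d                            # found it; f stays locked

# Hand-built state after rotating (b, c) under a (keys are made up).
beta, delta = Node(12), Node(20)
old_c = Node(15, beta, delta)
old_b = Node(10, None, old_c)
new_b = Node(10, None, beta)
new_c = Node(15, new_b, delta)
a = Node(5, None, new_c)
old_b.blue, old_b.back = True, a
old_c.blue, old_c.back = True, new_c

# Three searches that were suspended at the old node b when the rotation happened.
f1, d1 = find(old_b, 20); f1.lock.release()
f2, d2 = find(old_b, 12); f2.lock.release()
f3, d3 = find(old_b, 13); f3.lock.release()
```

In each case the search leaves the garbage nodes through a back pointer and ends at a white father, the same node that a search started from a would reach.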


4.3  The  Correctness  Proof  of  the  System 

We only need be concerned with properties P1, P2' and P3; it is trivial that the system
satisfies properties P4 and P5. Since a rotation process always locks nodes on a path in
top-to-bottom order, there is no danger of deadlock. Hence property P3 is satisfied. We
now prove that properties P1 and P2' hold. This proof uses the framework and terminology
established in Section 3.4.

By assumption A1, P1 and P2' hold initially. We also assume (inductively) that they hold up
to time t, when a change performed by a rotation process is made to the tree structure. We
need not be concerned with changes due to insertions, since, by the results of Section 3, we
know that insertions will preserve properties P1 and P2'. Define ε, T- and T+ as in Section
3.4.

a. Property P1 holds at time t+ε. Note that the rotation process locks all the nodes
it reads and writes. Hence the proof follows directly from assumption A2.

b. Property P2' holds at time t+ε. As before, consider a search process, say,
search(v), which starts before time t-ε and terminates after t+ε. By the inductive
assumption that P2' holds up to time t, we know that the assertion is true for
tree T-. Again, we assume that TP(t+ε) is well-defined and want to prove that
on T+ the termination position, TP(t+ε), of search(v) coincides with the
termination position, TP_root(t+ε), of find(root,v).

Case 1: The change at time t is a.dir1 ← c' or b.back ← a (cf. Fig. 4-3).


Figure 4-3: The new tree formed after the operation a.dir1 ← c'.

i. TP(t-ε) is well-defined. Then TP(t-ε) (=TP_root(t-ε)) must not be a, b,
or c, since they are all locked at time t-ε. This implies that TP(t) and
TP_root(t) are constant over [t-ε, t+ε], since the change at time t has
no effect on them. That is, search(v) or find(root,v) will terminate at
the same node on either T- or T+. Therefore, TP(t+ε) = TP_root(t+ε).

ii. TP(t-ε) is not well-defined. Then search(v) on T- must be blocked at
some node. It is still blocked on T+, since no lock will be released
after the change at time t and before the next pointer change. This
contradicts the assumption that TP(t+ε) is well-defined.

Case 2: The change at time t is c.back ← c'.

i. TP(t-ε) is well-defined. The proof that TP(t+ε) = TP_root(t+ε) is the
same as that in part (i) of case 1.

ii. TP(t-ε) is not well-defined. Then we can conclude that search(v) on
T- must be blocked at a, b or c, since: (1) these are the only nodes
whose locks will be released on T+ by the action at time t and (2)
search(v) is unblocked at time t (recall that TP(t+ε) is well-defined).
Further, on T+ search(v) will always come out from the garbage
nodes to reach correct white nodes. The procedure find(root,v)
does this, by utilizing the back pointers to resume the search
through the tree.

5.  A  Search-Insertion-Rotation-Deletion  System

In  this  section,  we  further  extend  our  system  to  include  concurrent  deletion  processes. 
Unlike  other  operations  considered  so  far,  deletion  is  not  a  "local"  operation  in  the  sense  that 
it  may  have  to  make  changes  in  two  sections  of  the  tree  that  are  arbitrarily  distant  from  each 
other.  That  is,  the  node  to  be  deleted  and  the  node  with  which  it  is  to  be  replaced  can  be 
arbitrarily  far  apart.  This  makes  the  deletion  operation  difficult  to  deal  with  in  a  concurrent 
system  where  only  a  constant  number  of  nodes  may  be  locked  by  a  process  at  any  time.  In 
this  section,  we  shall  demonstrate  how  "nonlocal"  operations  such  as  the  deletion  operation 
can  still  be  correctly  incorporated  into  a  concurrent  system  using  only  "local"  locks.  This  is 
achieved  through  the  repeated  use  of  the  rotation  process  introduced  in  section  4. 


5.1  An  Example 

Example  5.1  and  Figure  5-1  illustrate  that  an  existing  searching  process  may  become 
incorrect  when  another  deletion  process  is  executing  concurrently. 


        search(15)                      delete(20)

        <previous steps>
 1.     15 > f.value (=5)
 2.     s ← f.right (=a)
 3.     15 < s.value (=20)
 4.     s ← s.left (=b)
 5.                                     <search(20): obtain node a>
 6.                                     <search for the node in the
                                        left subtree of a which has
                                        the largest value (node e,
                                        in this case)>
 7.                                     replace a with e
 8.     15 > s.value (=7)

Example 5.1

After  step  7,  the  searching  process  searches  for  value  15  in  the  left  subtree  of  node  e  (cf. 
the  tree  in  Fig.  5-1(b)).  Property  P2  is  not  satisfied  because  find(root,15)  would  search  the
right  rather  than  the  left  subtree  of  node  e. 

In  general,  suppose  that  node  a  is  the  node  to  be  deleted  and  node  a’  is  the  node  with 
which  node  a  is  to  be  replaced.  Then  any  active  search  process  that  has  passed  node  a  while 
searching  for  a  value  between  a’.value  and  a.value  will  become  inconsistent  after  the  deletion. 
In  the  following  we  propose  a  method  for  dealing  with  this  problem. 

Figure 5-1: Deletion of node a with value 20: (a) the tree before the deletion;
(b) the tree after the deletion.


5.2  The  System 

The  search,  insertion  and  rotation  processes  are  the  same  as  those  defined  in  Section  4. 
The  deletion  process  is  described  as  follows. 

Note that if at least one son of the node to be deleted is λ (which should occur with
probability greater than 0.5), then the deletion is very simple. This is illustrated in Fig. 5-2. The
procedure remove defined below performs this simple deletion. Briefly, the procedure works
by changing the pointer from the father (a) of the node (b) to be deleted to point "around"
that node. (This only works when b has at most one son, since node a cannot use the same
pointer to point to two nodes simultaneously.) This operation removes b from the tree. A
back pointer is provided (which points from b to a) for any process that was searching at b
when a.dir was changed, so that the search can continue correctly.


Figure 5-2: The simple deletion case: (a) and (b).


The  Deletion  Process 

Remove: This procedure removes a node (a.dir1) when it is known that one son of that node
([a.dir1].dir2') is λ. Nodes a and a.dir1 are locked when the procedure is called, and are
unlocked when the procedure ends.

procedure remove(a,dir1,dir2)
    b ← a.dir1;
    c ← b.dir2;
    a.dir1 ← c;                                       /*Point around b*/
    b.dir2' ← c;                                      /*Redirect search from b: b.left=b.right=c*/
    b.back ← a;                                       /*Provide back pointer*/
    color b blue;                                     /*And blue node*/
    enqueue node b in GC-queue;                       /*For garbage collection*/
    unlock(a);
    unlock(b);
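
A minimal Python sketch of remove follows, under the same assumptions as before: threading.Lock objects stand in for the node locks, a Python list stands in for GC-queue, and the helper other and the key values are hypothetical.

```python
import threading

class Node:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right
        self.blue, self.back = False, None
        self.lock = threading.Lock()

def other(d):
    """The opposite direction (dir' in the pseudocode)."""
    return "right" if d == "left" else "left"

gc_queue = []                          # stand-in for GC-queue

def remove(a, dir1, dir2):
    """Caller holds the locks of a and b = a.dir1; b has no son on the side other(dir2)."""
    b = getattr(a, dir1)
    c = getattr(b, dir2)
    setattr(a, dir1, c)                # point around b
    setattr(b, other(dir2), c)         # both sons of b now lead to c
    b.back = a                         # provide back pointer
    b.blue = True                      # and mark b as garbage
    gc_queue.append(b)
    a.lock.release()
    b.lock.release()

# Usage: a(5) -> b(10) -> c(20), where b has only a right son.
c = Node(20)
b = Node(10, None, c)
a = Node(5, None, b)
a.lock.acquire(); b.lock.acquire()     # the caller's obligation
remove(a, "right", "right")
```

Because both sons of the blue node b point to c, a search that had already read b continues into the surviving subtree whichever way its comparison goes; a search that tries to lock b is sent back to a.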

The  deletion  process  described  below  is  formulated  as  two  steps:  (1)  find  the  correct  node 
and  (2)  delete  it  (handled  by  the  procedure  deletion-by-rotation).

Delete:  This  procedure  deletes  a  node  with  value  v  from  the  tree,  if  such  a  node  exists. 

procedure delete(v)
    (f,dir) ← find(root,v);
    if f.dir = λ then
        print "value v is not in the tree";
        unlock(f);
    else
        s ← f.dir;
        lock(s);
        deletion-by-rotation(f,dir)                   /*Do the dirty work*/
    fi

The  procedure  deletion-by-rotation(f,dir)  is  defined  below.  Since  simple  deletion  by  the
remove  procedure  is  sometimes  not  possible,  this  procedure  moves  the  node  to  be  deleted
down  in  the  tree  to  a  place  where  that  action  is  possible.  It  does  this  by  repeatedly  rotating
the  node  to  a  lower  position  in  the  tree  (using  the  rotation  procedure  defined  in  Section  4), 
until  it  is  possible  to  call  remove  to  actually  delete  the  node.  After  this  has  been 
accomplished,  the  procedure  works  its  way  back  up  the  tree  in  an  attempt  to  rebalance  the 
tree.  In  particular,  the  procedure  moves  the  node  down  the  tree  by  recursive  calls  on  itself. 
After  deletion,  it  rebalances  by  going  back  up  the  tree  (again  using  rotation),  after  each 
recursive  call  returns. 

In  the  version  of  the  deletion-by-rotation  procedure  that  we  give  here,  all  operations  are
biased  in  one  direction  for  purposes  of  clarity  and  simplification  of  the  algorithm.  This 
directional  bias  is  not  necessarily  unreasonable  if  the  deletion  starts  on  a  balanced  tree  or  if 
the  information  about  the  structure  of  the  tree  is  not  available.  If  one  were  striving  for 
efficiency,  one  could  add  additional  code  to  optimize  the  direction  in  which  rotations  and 
removals  were  to  be  done,  using  information  about  the  structure  of  the  tree. 

Note  that  in  the  call  to  rotation,  in  the  returned  triple  (f,g,h),  h  is  the  new  copy  of  the  node 
to  be  deleted,  and  f  is  identical  to  the  procedure  parameter  f. 

Deletion  by  Rotation:  The  following  procedure  deletes  node  f.dir  by  (recursively)  performing 
a  sequence  of  rotations  that  serve  to  move  f.dir  to  a  position  lower  in  the  tree  where  it 
can  be  removed  by  the  procedure  remove  given  above  (ending  the  recursion).  Nodes  f 
and  f.dir  are  locked  when  the  procedure  is  called.  The  procedure  also  rebalances  the 
tree  after  deletion  when  such  rebalancing  is  still  possible.  The  procedure  ends  with  no 
locks  set. 

procedure deletion-by-rotation(f,dir)
    s ← f.dir;
    if s.left = λ then remove(f,dir,right)            /*End recursion*/
                                                      /*(Note: Example of directional bias)*/
    else
        (f,g,h) ← rotation(f,dir,left);               /*Move f.dir down*/
        if h.left = λ then
            /*Don't need to rebalance on last recursive call*/
            deletion-by-rotation(g,right);
        else                                          /*Do recursion and rebalance*/
            deletion-by-rotation(g,right);            /*Recursive call*/
            /*N.B.: at this point, no nodes are locked*/
            lock(f);                                  /*Begin rebalance*/
            if g ≠ f.dir or f is blue then
                /*Can't rebalance, since things have changed*/
                unlock(f)
            else
                lock(g);
                (f,g',h') ← rotation(f,dir,right);
                unlock(g');
                unlock(h')
            fi
        fi
    fi
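
The structural effect of deletion-by-rotation can be checked with a single-threaded Python sketch. It deliberately omits all locking and the GC-queue, so it illustrates only the sequence of rotations, the final remove, and the rebalancing step on the way back up; the sentinel root with key minus infinity, the helper names and the sample keys are assumptions.

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right
        self.blue, self.back = False, None

def other(d):
    return "right" if d == "left" else "left"

def rotation(a, dir1, dir2):
    """Rotation by copying (locks omitted). Returns (a, c', b')."""
    b = getattr(a, dir1)
    c = getattr(b, dir2)
    b2, c2 = Node(b.value), Node(c.value)
    setattr(c2, dir2, getattr(c, dir2))
    setattr(c2, other(dir2), b2)
    setattr(b2, dir2, getattr(c, other(dir2)))
    setattr(b2, other(dir2), getattr(b, other(dir2)))
    setattr(a, dir1, c2)
    b.back, b.blue = a, True
    c.back, c.blue = c2, True
    return a, c2, b2

def remove(a, dir1, dir2):
    """Simple deletion of a.dir1, which has no son on the side other(dir2)."""
    b = getattr(a, dir1)
    c = getattr(b, dir2)
    setattr(a, dir1, c)
    setattr(b, other(dir2), c)
    b.back, b.blue = a, True

def deletion_by_rotation(f, d):
    s = getattr(f, d)
    if s.left is None:
        remove(f, d, "right")              # end of recursion
        return
    f, g, h = rotation(f, d, "left")       # move the doomed node down; h is its copy
    last = h.left is None                  # no rebalance after the last recursive call
    deletion_by_rotation(g, "right")
    if not last and getattr(f, d) is g and not f.blue:
        rotation(f, d, "right")            # rebalance on the way back up

def inorder(n):
    return [] if n is None else inorder(n.left) + [n.value] + inorder(n.right)

root = Node(float("-inf"))                 # sentinel; the real tree hangs off root.right
root.right = Node(20, Node(10, Node(5), Node(15)), Node(30))
deletion_by_rotation(root, "right")        # delete the node with key 20
keys = inorder(root.right)
print(keys)
```

In the concurrent version, the test before the rebalancing rotation can fail because another process has changed f.dir or made f blue in the meantime; in this single-threaded sketch it always succeeds.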

It  is  relatively  easy  to  check  that  the  inclusion  of  the  procedure  remove  preserves  the 
correctness  of  the  system  in  Section  4.  If  the  system  in  the  current  section  is  deadlock-free, 
then  we  can  conclude  that  it  is  correct,  since  it  is  built  from  the  procedure  remove  and  the 
system  in  Section  4.  To  show  that  the  system  is  deadlock-free,  we  note  that  at  any  level  of 
recursion,  when  the  deletion-by-rotation  call  returns,  no  nodes  are  locked.  For  each  level  at 
which  rebalancing  is  attempted,  therefore,  only  new  locks  are  used.  Furthermore,  they  are 
applied  using  the  top-down  discipline.  Thus,  deadlocks  cannot  occur  in  the  system. 

An  alternative  solution  to  the  problem  concerning  deletions  would  be  to  simply  leave 
locked  all  nodes  locked  by  the  deleter,  and  then  unlock  them  "on  the  way  back  up,"  after 
rebalancing.  However,  this  would  violate  our  constraint  of  never  locking  more  than  a  constant 
number  of  nodes  at  one  time. 



6.  Garbage  Collection

In  this  section  we  consider  the  problem  of  correctly  appending  garbage  nodes  to  the  free 
list. 


6.1  An  Example 

The  following  example  illustrates  that  for  garbage  collection  one  cannot  simply  append  blue 
nodes  to  the  free  list.  Refer  to  figure  5-2. 


        search(20)                      delete(10)

 1.     r ← a
 2.     20 > r.value (=5)
 3.     r ← r.right (=b)
 4.                                     <delete(b)>
 5.                                     garbage-collect(b)
 6.     compare r.value (=b.value)
        with 20

Example 6.1

Note  that  the  comparison  in  step  6  is  erroneous,  since  node  b  no  longer  exists  in  the  tree.
It  should  have  been  left  (blue)  and  not  garbage  collected  so  the  search  could  have  recovered 
from  the  deletion.  This  is  why  —  in  the  procedures  given  above  —  we  only  enqueue  blue 
nodes  to  be  garbage  collected.  The  garbage  collector  must  be  careful  not  to  collect  a  node  to 
which  another  process  might  still  have  access. 


6.2  Remarks 


Rules for a garbage collector are simply that it must not collect garbage too soon, but that it
also does not have to wait "too long." These can be stated more formally as:

1. Let f be a node that is detached from the tree. If f is referenced, or can be
referenced, by any process, then f is not yet garbage ready to be collected.

2. When the garbage collector prepares to collect node f, it only has to wait for
that particular node (not for later copies of the node) to no longer be
referenceable.

6.3  A  Solution

In this section we offer a simple solution with a single garbage collector. In Appendix I we
sketch modifications to the concurrent tree manipulation processes that allow a number of
garbage collectors to operate concurrently with the tree mutators.

Perhaps the simplest way to solve the problem is as follows. Periodically, the garbage
collector freezes the garbage collection queue (GC-queue) in the following sense: it locks the
queue (against any insertion to it by the tree mutators), copies it (to, say, a queue
GC'-queue), resets the original queue (GC-queue) to its empty state, and unlocks it. (Copying
GC-queue and resetting it can be done in constant time for arbitrarily long queues if the
queue is stored as a linked list.) Then, it waits until all of the currently running processes
have terminated. These are the processes that started running before GC-queue was locked
by the garbage collector, i.e. the processes which might access the garbage (blue) nodes in
GC'-queue. (Such a wait might be implemented, for example, by having each process enter in
a log the time when it starts and terminates. The GC process would then wait until "OUT"
entries appeared in the log for each process that had an "IN" entry, but no "OUT" entry, at
the time the GC-queue was locked.) After this wait, the garbage collector returns each of the
garbage nodes in GC'-queue to the free list by using the append procedure defined below.

Append:  This  procedure  returns  a  node  to  the  free  list. 

procedure append(n)
    n.color ← white;
    n.right ← λ;
    lock(FREE);
    [FREE.right].right ← n;
    FREE.right ← n;
    unlock(FREE)
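
The freeze-and-wait scheme described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the paper's implementation: the class BatchCollector and its method names are hypothetical, the "IN"/"OUT" log is reduced to a set of running process ids, Python lists replace the linked queue and the free list, and the blocking wait is replaced by a non-blocking try_release that the collector would call repeatedly.

```python
import threading

class BatchCollector:
    def __init__(self):
        self.queue_lock = threading.Lock()
        self.gc_queue = []             # GC-queue, filled by the tree mutators
        self.free_list = []            # stand-in for the free list
        self.log_lock = threading.Lock()
        self.running = set()           # ids with an "IN" entry but no "OUT" entry
        self.next_id = 0

    def process_in(self):
        """A tree process logs its start and receives an id."""
        with self.log_lock:
            pid = self.next_id
            self.next_id += 1
            self.running.add(pid)
            return pid

    def process_out(self, pid):
        """A tree process logs its termination."""
        with self.log_lock:
            self.running.discard(pid)

    def enqueue(self, node):
        with self.queue_lock:
            self.gc_queue.append(node)

    def freeze(self):
        """Detach the current queue and note which processes were running."""
        with self.queue_lock:
            batch, self.gc_queue = self.gc_queue, []
            with self.log_lock:
                waiting_for = set(self.running)
        return batch, waiting_for

    def try_release(self, batch, waiting_for):
        """Free the batch only if every process noted at freeze time has terminated."""
        with self.log_lock:
            if waiting_for & self.running:
                return False
        self.free_list.extend(batch)
        return True

gc = BatchCollector()
p = gc.process_in()                    # a search that starts before the freeze
gc.enqueue("b")                        # a mutator discards blue node b
batch, waiting_for = gc.freeze()
q = gc.process_in()                    # starts after the freeze: not waited for
gc.enqueue("x")                        # goes to the next batch
too_early = gc.try_release(batch, waiting_for)
gc.process_out(p)
released = gc.try_release(batch, waiting_for)
```

Processes that start after the freeze need not be waited for, since white nodes never point to blue nodes and such processes therefore cannot reach the nodes in the frozen batch.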

With  this  solution,  blue  (or  garbage)  nodes  may  not  be  appended  to  the  free  list  for  some 
long  period  of  time  after  they  become  garbage.  This  is  undesirable  for  situations  where 
space  utilization  is  crucial.  Note,  however,  that,  because  white  nodes  never  point  to  blue 
nodes  in  our  systems,  the  existence  of  blue  nodes  has  no  effect  on  the  speeds  of  those 
searches  through  the  tree  which  started  after  these  nodes  had  become  garbage  nodes.  Also, 
the  execution  of  the  append  procedure  is  carried  out  in  parallel  with  other  processes.  Thus 
it  appears  that  this  simple  solution  is  quite  acceptable  as  far  as  search  speeds  are  concerned. 

7.  Summary  and  Concluding  Remarks


In  this  paper,  we  have  examined  some  of  the  details  of  a  particular  problem  in  concurrent 
database  manipulation.  To  the  authors'  knowledge,  many  of  the  properties  of  the  systems 
presented  are  not  achievable  on  the  basis  of  any  existing  general  theory  on  concurrency 
control.  For  example,  in  the  two-phase  locking  scheme  offered  by  Eswaran  et  al.  [7],  a
search  process  would  be  required  to  lock  all  nodes  in  the  search  path,  and  would  not  release 
any  of  these  locks  until  the  end  of  the  search.  The  special  structure  of  binary  search  trees 
enables  us  to  design  concurrent  systems  enjoying  a  high  degree  of  concurrency.  For  further 
discussion  and  results  on  how  special  information  about  a  problem  can  help  the  design  of 
efficient  concurrent  database  systems,  see  [11,  12]. 

We summarize some of the important contributions embodied in the concurrent binary
search tree systems presented in this paper:

- The algorithms use neither reader locks, nor exclusive locks. Only
  writer-exclusion locks are used, simply to prevent the obvious problems
  engendered by simultaneous update of a node by more than one process. The
  locking scheme used to apply these locks is simple. In particular, it is
  implementable without the overhead incurred by a queue manager or a system
  supervisor.

- The size of the region of the tree which is locked by a process at any time is
  bounded by a (small!) constant.

- The idea of copying (doing large amounts of work outside the data structure
  and then indivisibly introducing all of the changes simultaneously) is a useful
  technique for removing some of the inherent complexity of concurrent
  operations.

- The back pointers and blue nodes are a specific instance of the idea of a general
  mechanism for recovery from some of the "confusion" caused by concurrency.
  Such a mechanism is provided for use by processes whose earlier actions have
  become invalid as a result of the actions of another process.

- In order to take full advantage of the power of multiprocessing, we introduce the
  idea of postponement. This is embodied by the rule: "A process should only do
  what it has to do." Often, nothing is lost by allowing a second process to
  continue the work begun by a first process. In fact, waiting time may sometimes
  be avoided by postponing work (e.g. collection of the garbage nodes produced
  by a process is postponed and is eventually performed by a garbage collector
  process).

- We present a fairly rigorous proof of the correctness of our concurrent systems.
  In doing so we demonstrate that such correctness can be proved, and we
  develop techniques for use in these proofs.

- Two garbage collection mechanisms are offered. These are auxiliary to the main
  tree system. They allow us to further exploit the concurrency available by using
  some of the techniques mentioned above (copying, postponement, etc.), and
  decoupling the necessary garbage collection from the main tree operations
  (insertion, deletion, reorganization).

Binary  search  trees  represent  a  very  simple  structure  for  storing  data.  Further  work 
should  try  to  extend  some  of  the  ideas  presented  in  this  paper  to  more  general  database 
systems. 

Concurrent  Garbage  Collection


I.1.  The  Problem

In  section  6  we  presented  a  simple  approach  to  the  garbage  collection  problem.  Here  we 
sketch  modifications  to  the  set  of  concurrent  processes  given  in  the  paper  which  will  allow  a 
set  of  concurrent  garbage  collectors  to  operate  correctly  in  parallel  with  the  tree  mutators. 
The  garbage  collectors  will  never  incorrectly  collect  a  blue  node  that  might  still  be  used  by 
some  process  (as  was  described  in  Example  6.1).  This  scheme  should  be  used  when  space 
utilization  is  important,  since  garbage  is  collected  and  returned  to  the  free  list  very  quickly 
(relative  to  the  batched  collection  suggested  in  Section  6). 

Concurrent garbage collection in a list processing environment has recently received much
attention (see, for example, [5] and [13]). The problem considered in this appendix is
different in that the safety of collecting a garbage node depends upon whether or not it is
reachable by any existing or future search; knowing that the node is not reachable from the
root does not ensure the safety of collecting the node. The idea of the method described in
this appendix is to use reference counts to guarantee that only blue nodes that are no longer
reachable by any existing or future search will be garbage collected. Note that in a tree
there are no cycles; so we don't encounter some of the problems of using reference count
schemes in general list processing. We assume that in addition to the usual fields, each node
also has two reference counters, node.ref[0] and node.ref[1], and one index (or indicator)
field, node.index, to designate which counter is currently in use for that node. The field
node.index can take either 0 or 1 as a value to designate one of the two reference counts.
We further assume that interchanging between these two values (denoted "comp" for
complement) is an indivisible operation and that incrementing and decrementing reference
counts (denoted "inc" and "dec") can also be done indivisibly.


I.2.  The  System

In  this  section,  we  demonstrate  the  modifications  to  the  procedures  given  above  that  will 
allow  concurrent  garbage  collection  as  described. 

The  search  process  used  in  previous  sections  must  be  redefined  to  handle  updates  for 
reference  counts.  For  any  node  n,  n.ref[0],  n.ref[1]  and  n.index  are  initially  set  to  zero  by
the  create  procedure. 

The  Search  Process

Search:  This  procedure  searches  for  a  node  in  the  tree  with  a  given  value,  v. 

procedure search(v)
    j ← ROOT.index;
    inc ROOT.ref[j];                                  /*Reference ROOT*/
    (f,dir) ← find(ROOT,j,v);
    s ← f.dir;
    if s≠λ then print "Value v is at node s"
    else print "Value v is not in the tree" fi;
    unlock(f);

The procedure find(n,i,v) is redefined as follows. When the procedure is called, n.ref[i] has
already been incremented by the search process which calls the procedure.

Find:  Recursively  performs  the  search,  modifying  reference  counts  as  appropriate. 

procedure find(n,i,v)
    f ← n;
    if v < f.value then dir ← left else dir ← right fi;
    s ← f.dir;                                        /*Find son*/
    if s≠λ and s.value≠v then
        {j ← s.index; inc s.ref[j]};                  /*Reference son*/
        /*(Operations inside {......} assumed indivisible.)*/
        dec f.ref[i];                                 /*Then dereference father*/
        return find(s,j,v)                            /*Next level*/
    else
        lock(f);
        if f is blue then
            unlock(f);
            t ← f.back;                               /*Get pointer to back son*/
            {j ← t.index; inc t.ref[j]};              /*Reference it*/
            dec f.ref[i];                             /*Then dereference this one*/
            return find(f.back,j,v)                   /*Follow back pointer*/
        else
            if s≠f.dir then
                unlock(f);                            /*Lost it*/
                return find(f,i,v)
            else                                      /*Found it*/
                dec f.ref[i];
                return (f,dir)
            fi
        fi
    fi
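
The reference-count discipline of the lock-free part of this find (reference the son first, then dereference the father) can be sketched in Python. The sketch is single-threaded and assumes nothing beyond the fields ref[0], ref[1] and index described above; the function names and keys are hypothetical, the indivisibility of the bracketed pair is not modelled, and the locking branch and the blue-node branch of find are left out.

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right
        self.ref = [0, 0]              # two reference counters
        self.index = 0                 # which counter is currently in use

def descend(root, v):
    """Move down toward v while keeping exactly one node referenced at a time.

    Returns (f, i, d): the father f of the position of v, the counter index i
    under which f is still referenced, and the direction d taken from f.
    """
    f, i = root, root.index
    f.ref[i] += 1                      # reference ROOT (done by search in the paper)
    while True:
        d = "left" if v < f.value else "right"
        s = getattr(f, d)
        if s is None or s.value == v:
            return f, i, d             # the paper's find would now lock f
        j = s.index
        s.ref[j] += 1                  # reference son ...
        f.ref[i] -= 1                  # ... then dereference father
        f, i = s, j

def all_nodes(n):
    return [] if n is None else [n] + all_nodes(n.left) + all_nodes(n.right)

leaf = Node(12)
mid = Node(10, None, Node(15, leaf, Node(20)))
root = Node(5, None, mid)
mid.index = 1                          # pretend mid currently uses its second counter

f, i, d = descend(root, 12)
total = sum(sum(n.ref) for n in all_nodes(root))
```

Taking the new reference before dropping the old one means that the search always holds a reference to at least one node on its path. The index j is read once and carried along, so the later dec is applied to the same counter that was incremented even if the node's index field changes in between.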


The  new  version  of  the  insert  procedure  follows: 

The  Insertion  Process


Insertion:  This  procedure  inserts  a  node  with  value  v  into  the  tree  (at  one  of  the  leaves),  if 
no  such  node  already  exists  in  the  tree. 

procedure insert(v)
    j ← ROOT.index;
    inc ROOT.ref[j];                                  /*Reference ROOT*/
    (f,dir) ← find(ROOT,j,v);
    if f.dir≠λ then
        print "Value v is already in the tree";
        unlock(f)
    else
        create(w);                                    /*Build a node*/
        w.left ← λ;
        w.right ← λ;
        w.value ← v;
        f.dir ← w;                                    /*Point to it*/
        unlock(f)
    fi

The  new  delete  procedure: 

Delete: This procedure deletes a node with value v from the tree, if such a node exists.

procedure delete(v)
    (f,dir) ← find(ROOT,v);
    if f.dir = λ then
        print "value v not in tree";
        unlock(f)
    else
        s ← f.dir;
        lock(s);
        deletion-by-rotation(f,dir)     /*Do the dirty work*/
    fi

The  procedure  deletion-by-rotation  is  modified  as  follows: 

Deletion  by  Rotation: 

procedure deletion-by-rotation(f,dir)
    i ← f.index;
    inc f.ref[i];
    s ← f.dir;
    if s.left = λ then remove(f,dir,right)      /*End recursion*/
                                                /*(Note: Example of directional bias)*/
    else
        (f,g,h) ← rotation(f,dir,left);         /*Move f.dir down*/
        if h.left = λ then
            /*Don't need to rebalance on last recursive call*/
            deletion-by-rotation(g,right);
        else                                    /*Do recursion and rebalance*/
            deletion-by-rotation(g,right);      /*Recursive call*/
            /*N.B.: at this point, no nodes are locked*/
            lock(f);                            /*Begin rebalance*/
            if g≠f.dir or f is blue then
                /*Can't rebalance, since things have changed*/
                unlock(f)
            else
                lock(g);
                (f,g',h') ← rotation(f,dir,right);
                unlock(g');
                unlock(h')
            fi
        fi
    fi
    dec f.ref[i];

Create: This procedure creates a new node named w, by removing it from the free list. It
also sets the reference counts and index for w to zero.

procedure create(w)
    lock(FREE);
    if FREE.left=FREE.right then
        abort the process which calls create and inform
        the system that the free list is empty
    else
        w ← FREE.left;
        FREE.left ← [FREE.left].right
    fi
    unlock(FREE);
    w.ref[0] ← 0; w.ref[1] ← 0; w.index ← 0;

The procedures append, remove, and rotation are the same as those given above in the main
part of the paper.

We  include  in  the  algorithm  below  steps  to  handle  the  multiple  garbage  collectors  case. 
For  this  purpose,  we  require  the  use  of  the  additional  field  for  each  node  mentioned  above: 
the  GC-lock  field.  Garbage  collectors  use  this  lock  to  prevent  confusion  caused  by  switching 
a  reference  count  field  while  another  garbage  collector  is  still  using  it.  There  is  also  a  single 
lock on the entire GC-queue; this lock is also used by the enqueue operation in the rotation
and  remove  procedures.  Moreover,  for  technical  reasons,  in  the  rotation  procedure  we 
enqueue  the  triple  (a,b,c)  rather  than  the  nodes  b  and  c,  and  in  the  remove  procedure  we 
enqueue  the  pair  (a,b)  rather  than  node  b. 
The Garbage Collection Process


Garbage-Collection: This process appends garbage nodes to the free list.

    lock(GC-queue);                     /*single lock for GC-queue*/
    get (a,b,c) (or (a,b) for which the algorithm is
        similar) from GC-queue;
    lock a for GC;                      /*set GC-lock field of a*/
    unlock(GC-queue);
    a.index ← comp a.index;             /*switch a*/
    i ← comp a.index;                   /*Old counter*/
    while a.ref[i] > 0 wait;            /*Let old processes drain*/
    unlock a for GC;                    /*We're done with it*/
    lock(b);                            /*Make sure no GC is*/
    unlock(b);                          /*using b*/
    i ← b.index;
    while b.ref[i] > 0 wait;            /*be sure everyone done with b*/
    append(b);                          /*append b to free list*/
    if we got (a,b) instead of (a,b,c) then return;   /*Done*/
    lock(c);                            /*Make sure no GC is*/
    unlock(c);                          /*using c*/
    i ← c.index;
    while c.ref[i] > 0 wait;            /*be sure everyone done with c*/
    append(c)                           /*append c to free list*/


The  procedure  append(n)  was  defined  in  Section  6. 
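The switch-and-drain step of the collector can be sketched in Python for the single-collector case and a pair (a,b). This is only an illustration under stated assumptions: the GC-lock field and the GC-queue are omitted, ATOMIC is a hypothetical stand-in for indivisible steps, and the names Node, reference, dereference, drain, collect, events and free_list are introduced here for the example.

```python
import threading
import time

class Node:
    def __init__(self, name):
        self.name = name
        self.ref = [0, 0]              # the two reference counters
        self.index = 0                 # counter used by new arrivals

ATOMIC = threading.Lock()              # assumed stand-in for indivisible steps
free_list = []
events = []

def reference(n):
    with ATOMIC:
        j = n.index
        n.ref[j] += 1
        return j

def dereference(n, i):
    with ATOMIC:
        n.ref[i] -= 1

def drain(n, i):
    """Wait until counter i of node n has dropped to zero."""
    while True:
        with ATOMIC:
            if n.ref[i] == 0:
                return
        time.sleep(0.001)

def collect(a, b):
    """a is the former father of the garbage node b."""
    with ATOMIC:
        old = a.index
        a.index = 1 - a.index          # switch a: later arrivals use the new counter
    drain(a, old)                      # let old processes drain from a
    drain(b, b.index)                  # be sure everyone is done with b
    events.append("b freed")
    free_list.append(b)                # append b to the free list

a, b = Node("a"), Node("b")
i = reference(a)                       # a search is at a when b becomes garbage

def slow_search():
    time.sleep(0.05)
    j = reference(b)                   # it still follows the old pointer to b
    dereference(a, i)
    time.sleep(0.05)
    events.append("search leaves b")
    dereference(b, j)

t = threading.Thread(target=slow_search)
t.start()
collect(a, b)
t.join()
```

Because the search references b before it dereferences a, the collector cannot finish draining a while a search that may still reach b is uncounted, so b is freed only after that search has left it.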


I.3.  Comments  and  Justification

Note  the  simplicity  of  the  GC  operation  taken  sequentially  for  collection  of  nodes  b  and  c. 
It  consists  of  switching  the  counter  on  node  a  (thus  searches  arriving  at  a  after  the  switch 
will  increment  the  reference  count  in  the  new  counter),  letting  the  old  processes  "drain"  from 
a,  letting  b  drain,  freeing  b,  and  if  we  have  a  triple  (a,b,c)  then  letting  c  drain  and  freeing  c. 

The  concurrent  garbage  collection  works  simply  because  we  do  the  switch  on  a  after  b 
becomes  garbage.  Then  we  let  all  old  processes  drain  from  a.  After  this  step,  any  process 
which  can  access  a  must  access  it  at  some  time  after  a  enters  the  GC-queue,  which 
guarantees  that  it  accesses  a  after  a  no  longer  points  to  b.  Then  we  simply  have  to  wait  for 
all  old  processes  to  drain  from  b.  This  means  that  b  is  safe  to  free. 

Further,  suppose  we  are  running  concurrent  garbage  collectors  that  might  interfere  with 
each  other.  We  first  observe  that  only  one  tuple  in  the  GC-queue  can  have  any  node,  b,  as 
the  non-first  node.  Otherwise  that  node  would  have  been  deleted  from  the  tree  twice  before 
being  returned  to  the  free  list.  This  is  clearly  impossible  based  on  the  operation  of  the 
deletion and rotation (and search) algorithms.

Now  suppose  that  two  garbage  collectors  are  working,  say,  on  the  tuples  (a,b)  and  (b,c). 
(The  case  of  ordered  triples  instead  of  ordered  pairs  is  an  easy  generalization.)  Then  we  can 
show  that  it  is  impossible  that  the  tuples  were  removed  from  the  GC-queue  in  the  order: 
(a,b), (b,c). This is, of course, equivalent to being placed in the queue in that order. If this
were  so,  then  consider  placing  (b,c)  in  the  queue.  Placing  this  tuple  in  the  queue  implies  that 
the  node  c  was  removed  from  the  tree,  and  that  node  b  was  the  father  of  c,  but  was  still  in 
the tree. However, at the time that (b,c) was placed in the queue, (a,b) is already in the
queue,  implying  that  node  b  had  previously  been  removed  from  the  tree.  For  (b,c)  to  be 
placed  in  the  queue,  both  b  and  c  must  be  locked;  similarly  for  (a,b).  But  then  after  (a,b)  is 
placed  in  the  queue,  b  must  be  unlocked  before  it  can  be  locked  by  the  process  that  locks  b 
and  c  and  places  (b,c)  in  the  queue.  Therefore,  this  latter  process  locks  a  node  that  has  been 
deleted from the tree. However, such a lock (on a blue node) is checked for, and immediately
released  if  detected.  Therefore,  a  node  that  had  been  deleted  from  the  tree  would  not  be 
the  first  element  in  an  ordered  pair  (triple)  placed  in  the  GC-queue.  This  contradicts  the 
placement  of  (b,c)  in  the  queue  after  (a,b). 

Therefore,  we  know  that  the  tuples  occur  in  the  order:  (b,c),  (a,b). 

Lastly,  we  observe  that  a  garbage  collector,  say  g,  locks  node  b  (from  tuple  (a,b)  or  (a,b,c)) 
to  guarantee  that  other  garbage  collectors  (those  using  it  in  tuples  of  the  form  (b,c)  or 
(b,c,x)), say g', are done with it. Any such g' would lock node b while in the critical section of
the garbage collector (protected by lock/unlock(GC-queue)). This means that g cannot lock b
until all tuples (b,x) -- that occur in the queue before (a,b) -- have been processed to the
extent that they have locked (and then unlocked!) b. Therefore, node b will not be garbage
collected  by  g  until  it  is  safe  to  do  so. 
II.  Related  Work


In  this  appendix,  we  discuss  alternative  solutions  to  the  problem  presented  in  this  paper, 
and  related  work  that  has  been  done.  In  discussing  alternative  solutions,  we  point  out  some 
of  the  advantages  and  disadvantages  of  each. 

Some  relevant  literature.  (This  list  is  by  no  means  complete.)  For  examples  of  the  design 
and  construction  of  multiprocessors  see  Wulf  and  Bell  [24],  and  Swan,  Fuller  and  Siewiorek 
[22].  For  examples  of  verification  methodology,  see  Dijkstra’s  book  [4],  and  the 
comprehensive  survey  by  Manna  and  Waldinger  [17]  (and  its  references).  For  extensions  of 
verification  ideas  to  parallel  programs,  see  the  work  by  Owicki  [19]  and  Lamport  [14].  In  the 
database  systems  area,  research  in  concurrency  and  integrity  control  has  been  done,  for 
example, by: Eswaran et al. [7], Gray [8], and Ries and Stonebraker [20].

Since  B-trees  (see  Bayer  and  McCreight  [2]  or  Knuth  [10])  have  been  found  convenient  for 
storing  large  amounts  of  data  in  the  sequential  case,  many  database  systems  have  been 
constructed  using  B-trees  (or  often  B*-trees;  see  Wedekind  [23])  as  the  main  data  structure 
(e.g., Astrahan et al. [1]). These structures have the advantage that they are balanced by
definition  (although  this  does  not  preclude  the  necessity  of  other  forms  of  reorganization). 
While  we  chose  to  examine  the  structure  of  binary  search  trees,  much  similar  work  on  the 
question  of  concurrent  operations  on  B-trees  and  B*-trees  has  been  done.  We  note, 
however,  that  the  branching  factor  in  most  practical  B-trees  is  such  that  the  number  of  levels 
required  to  store  large  amounts  of  data  rarely  exceeds  four.  This  raises  the  question  of  just 
how  much  concurrency  we  can  squeeze  into  such  a  flat  structure. 

a.  The first solution to the concurrent B-tree problem was advanced by Samadi
[21].  His  approach  is  to  use  semaphores  to  exclusively  lock  the  path  along  which 
modifications  may  take  place,  effectively  locking  the  entire  subtree  of  the  highest 
node  locked. 

b.  Bayer  and  Schkolnick  [3]  improve  upon  this  by  proposing  a  parametrized 
algorithm  for  concurrent  B*-tree  manipulation.  This  algorithm  locks  upper 
sections  of  the  tree  with  writer-exclusion  locks  (which  do  not  lock  out  readers), 
until the actual modifications need to be done (when exclusive locks are finally
applied),  thus  increasing  the  concurrency  of  the  algorithm. 

c.  Miller  and  Snyder  [18]  are  working  on  a  solution  which  locks  a  region  of  the  tree 
of  bounded  size  (which  is  close  to  our  notion  of  locking  a  region  of  constant 
size).  This  locked  region  propagates  up  the  tree,  performing  appropriate 
modifications  to  the  tree  structure. 

d.  Ellis  [6]  presents  a  solution  for  2-3  trees  (generalizable  to  B-trees)  which  uses 
several  methods  to  enhance  concurrency.  These  methods  include  an  application 
of  Lamport’s  idea  for  correctly  reading  and  writing  simultaneously  [15].  The 
algorithms  Ellis  presents  allow  temporary  departures  from  the  tree  structure  in 
order  to  minimize  the  cost  of  maintaining  consistency  during  concurrent 
operations. Also, "relaxing a process's responsibility to do its own work" is a
specific case of our idea of postponement: the structural degradation caused by
one  process  may  be  fixed  later  by  another. 

e.  A paper by Lehman and Yao [16] will contain a more extensive survey of these
ideas,  along  with  a  concurrent  B*-tree  algorithm  that  uses  some  of  the  ideas  in 
the  present  paper  to  achieve  minimal  (constant  size)  locking  and  high 
concurrency. 

Guibas  and  Sedgewick’s  [9]  scheme  for  representing  many  types  of  tree  structures  as 
"dichromatic"  binary  trees  suggests  that  the  problems  of  concurrently  maintaining  more 
general  trees  may  be  reducible  to  the  set  of  problems  studied  in  the  present  paper. 
III.  A  Correctness  Criterion  for  Concurrent  Search  Systems

The binary search tree or any physical database storage structure can be viewed as an
implementation  of  the  abstract  notion  of  some  data  storage  mechanism.  The  abstract  notion 
specifies  properties  of  various  operations  that  we  wish  to  perform  on  the  database. 

In  this  paper,  we  have  adopted  a  natural  abstract  notion:  the  responses  given  by  the 
search  processes  must  correctly  reflect  the  results  of  the  modifying  operations  on  the 
database.  For  example,  if  the  following  operations  take  place  (shown  in  order  of  termination 
time)  then  the  responses  given  by  the  search  processes  must  be  the  ones  shown  on  the  right 
hand  side. 

    insert(1)
    insert(2)
    search(2)        response: "Yes."
    delete(2)
    search(1)        response: "Yes."
    search(2)        response: "No."
    rotate

That is, search(v) returns the answer "Yes" if and only if the number of successful insert(v)
operations which have terminated so far is strictly larger than the number of successful
delete(v) operations which have terminated. We use the same abstract notion for concurrent
database systems. We say a concurrent system is correct if it implements the abstract notion.

It is necessary to define more precisely what we mean by "termination time of a process"
in a concurrent environment. We define this as the instant at which an updating process
makes its last modification to the database link structure or the instant at which a query
process reports the result of its search. We can easily argue that properties P1 to P5 are
sufficient for the correctness criterion stated here. (Actually, they also guarantee no
deadlocks and completion of all processes.) Since by P5 a value v will not be added to or
deleted from the tree without using insert(v) or delete(v), respectively, we only need to check
that a search(v) returns "Yes" (i.e., finds v) if and only if v is in the tree. This is guaranteed
by P1 and P2; P1 ensures that the tree is always consistent, and thus by P2 the search will
find v if and only if it is in the tree (since the search process on the "frozen" tree is correct
in the sense of assumption A2).
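The abstract notion can be written as a small oracle that, given a history of operations in order of termination time, computes the responses the searches must give. The sketch below is in Python; the function name expected_responses and the tuple layout (operation, value, succeeded) are assumptions made for the example, and operations such as rotate are simply ignored because they do not change the set of stored values.

```python
from collections import Counter

def expected_responses(history):
    """history: (op, value, succeeded) tuples in order of termination time.

    Returns one (value, answer) pair per search, where the answer is "Yes"
    exactly when successful inserts of that value outnumber successful deletes.
    """
    inserts, deletes = Counter(), Counter()
    answers = []
    for op, v, ok in history:
        if op == "insert" and ok:
            inserts[v] += 1
        elif op == "delete" and ok:
            deletes[v] += 1
        elif op == "search":
            answers.append((v, "Yes" if inserts[v] > deletes[v] else "No"))
    return answers

# The history used as the example in the text.
history = [
    ("insert", 1, True),
    ("insert", 2, True),
    ("search", 2, None),
    ("delete", 2, True),
    ("search", 1, None),
    ("search", 2, None),
    ("rotate", None, None),
]
responses = expected_responses(history)
```

A test harness for a concurrent implementation could record termination times, sort the operations by them, and compare the observed search results with this oracle.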